Variable bitrate video coding method and corresponding video coder.

ABSTRACT

The variable bitrate coding method according to the invention comprises an iterative process including a first analysis pass and a second prediction pass, followed by a final control step for adjusting the quantization stepsize with respect to a target bitrate. According to the invention, a picture re-arrangement step is provided between the analysis and prediction steps of one iteration, in order to encode the picture sequence with improved quality. Application: MPEG-2 encoders for storage media with limited capacity.

[0001] The present invention relates to a variable bitrate video coding method including an iterative process that comprises a first analysis step, for coding a bitstream corresponding to a picture sequence with a constant quantization stepsize, and a second prediction step, for predicting the quantization stepsize which must then be used to code said bitstream according to a predetermined target bit rate, and is followed by a final control step, for adjusting the stepsize with respect to said target bit rate. The invention also relates to a corresponding video coder for carrying out said method.

[0002] As described in the document “MPEG Video coding: a basic tutorial introduction”, S. R. Ely, BBC Research and Development Report, BBC-RD-1996/3, pp. 1-10, MPEG activities started in 1988 with the aim of defining standards for digital compression of video and audio signals. The first goal was to define a video algorithm for digital storage media such as the CD-ROM (Compact Disc Read-Only Memory), but the resulting standard was also applied in the Interactive CD system (CD-I). Allowing transmission and storage of picture data at bit rates in the range of 1 to 15 Mbits/s, this standard is based on a data compression achieved by using a block-based motion compensation for the reduction of the temporal redundancy and a discrete cosine transformation (DCT) for the reduction of the spatial redundancy.

[0003] With conventional CD standards such as CD-I and CD-ROM, the transmission bit rate is fixed and pictures can therefore only be coded at a constant bit rate. New standards such as the Digital Versatile Disc (DVD) allow for transmission of data at a variable bit rate (VBR): complex scenes can be coded at a higher bitrate than scenes containing less information, in order to maintain a constant quality.

[0004] An object of the invention is to propose a VBR video coding method that achieves such a constant quality of the coded sequence, with a minimal bit rate of the encoder output bitstream.

[0005] To this end the invention relates to a VBR video coding method such as defined in the preamble of the description and which is moreover characterized in that it comprises, between the analysis and prediction steps of one iteration, a picture re-arrangement step.

[0006] Such a picture re-arrangement step preferably comprises in series a first scene change detection sub-step, a second allocation sub-step, and a third optimum placement sub-step.

[0007] In a preferred implementation, said scene change detection sub-step comprises a correlation operation, carried out between succeeding pictures of the sequence, and a decision operation, for indicating the possible occurrence of a scene change.

[0008] Another object of the invention is to propose a VBR video coder for carrying out said coding method.

[0009] To this end the invention relates to a variable bit rate video coder comprising a first coding branch, a second prediction branch, and a control circuit provided for carrying out the implementation of the following operations:

[0010] a coding operation of a bitstream corresponding to a picture sequence with a constant quantization stepsize;

[0011] a prediction operation, for an estimation of the quantization stepsize allowing to code said bitstream according to a specified target bitrate;

[0012] at least one repetition of said operations;

[0013] a final controlling operation, for adjusting the stepsize with respect to said target bitrate;

[0014] characterized in that said control circuit is provided for implementing, between the first coding operation and the first prediction operation, a picture re-arrangement operation.

[0015] The advantages of the invention will now be explained in more detail with reference to the following description and the accompanying drawings, in which:

[0016] FIG. 1 shows some pictures of a group of pictures,

[0017] and FIGS. 2 and 3 illustrate for such a typical group of pictures the difference between the display order of the pictures and their transmission order;

[0018] FIGS. 4 and 5 show respectively the main steps of the coding method according to the invention and a corresponding architecture of video coder allowing to implement said method;

[0019] FIG. 6 shows some pictures and the associated motion vectors for a sequence with a scene change between two successive B pictures;

[0020] FIGS. 7 and 8 show values of two detection coefficients DCL and DCR computed in order to evaluate quantitatively the motion vector statistics used by the macroblocks of the B picture for the three possible positions of a scene cut with respect to the two B pictures of an IBBP or a PBBP group of pictures;

[0021] FIG. 9 shows values of a similar detection coefficient DCP for P pictures;

[0022] FIG. 10 illustrates the performance of the scene change detection method using only B pictures;

[0023] FIG. 11 gives the decision values of the Viterbi algorithm used to solve the problem of optimal allocation of groups of pictures (or GOPs);

[0024] FIG. 12 illustrates an example of weighting function for the GOP size (as weighting values are used to evaluate the size of the GOPs);

[0025] FIG. 13 shows a control loop used for the implementation of the final control step of said coding method.

[0026] Before describing the coding method according to the invention, some basic principles of the MPEG-2 video standard may be recalled. The flexibility of this standard, intended to support a wide range of possible applications, is obtained thanks to the definition of profiles and levels that cover all of the application requirements. A profile is a subset of the MPEG-2 standard intended to support only the features needed by a given class of application, while a level defines a set of imposed constraints on parameters of the bitstream.

[0027] The basic steps of the MPEG-2 compression, applied to colour pictures consisting of three components (Y,U,V), concern pictures that are divided into small subsections, or macroblocks, themselves consisting of luminance and chrominance blocks. These steps are motion estimation and compensation (based on macroblocks of 16 pixels × 16 lines), discrete cosine transformation (based on blocks of 8 pixels × 8 lines), and run-length coding.

[0028] Three types of pictures are defined. Intra pictures (or I pictures) are coded without reference to other pictures, predictive pictures (or P pictures) are coded using a motion-compensated prediction from a past I or P picture, and bidirectional-predictive pictures (or B pictures) use both past and future I or P pictures for motion compensation. The motion information is given in the form of motion vectors obtained by implementation of a block-matching search (in which a large number of trial offsets are tested in the coder and the best one is selected on the basis of a measurement of the minimum error between the block being coded and the prediction).

[0029] As indicated in FIG. 1, which illustrates how the P and B pictures are defined on the basis of the motion vectors, the different pictures typically occur in a repeating sequence which is termed, as said above, a group of pictures, or GOP, and consists of an I picture and all succeeding pictures until the next I picture occurs. A typical GOP is shown in display order in FIG. 2 (the black arrows correspond to forward predictions and the white ones to backward predictions, the sequences and the predictions repeating periodically) and in transmission order in FIG. 3 (P4, P7, I10, P13 designate the re-ordered frames), said orders being different so that backward predictions from future pictures are possible at the decoding side.

[0030] A regular GOP structure can be described with two parameters, N and M. The parameter N, defined as the size of a GOP, is, as shown in FIG. 2, the number of pictures of said GOP, i.e. the number of pictures between two I pictures plus one. The parameter M is the spacing of P pictures, or (which is the same) the number of adjacent B pictures plus one. In the illustrated example of FIGS. 1 to 3, M=3 and N=9. Obviously other combinations are possible:

Picture display order    N     M
IPPPPPIPP                6     1
IBPBPBPBI                8     2
IBBPBBPBBPBBIBBP         12    3

[0031] N and M being chosen independently from each other.
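By way of illustration only, the relation between the parameters N and M, the display order and the transmission order may be sketched as follows. The function names `display_order` and `transmission_order` are hypothetical and do not appear in the description; the sketch assumes a regular GOP structure and the simple rule that each B picture is transmitted after the reference picture that follows it in display order.

```python
def display_order(n, m, count):
    """Picture types in display order for a regular GOP structure (N, M)."""
    types = []
    for k in range(count):
        pos = k % n                      # position inside the current GOP
        if pos == 0:
            types.append("I")            # every GOP starts with an I picture
        elif pos % m == 0:
            types.append("P")            # reference pictures are M apart
        else:
            types.append("B")            # remaining pictures are bidirectional
    return "".join(types)


def transmission_order(types):
    """Reorder labelled pictures so that each B follows its future reference."""
    out, pending_b = [], []
    for index, t in enumerate(types, start=1):
        label = f"{t}{index}"
        if t == "B":
            pending_b.append(label)      # wait for the next I or P picture
        else:
            out.append(label)            # send the reference picture first
            out.extend(pending_b)        # then the B pictures that depend on it
            pending_b = []
    return out + pending_b               # trailing B pictures, if any


if __name__ == "__main__":
    print(display_order(12, 3, 16))
    print(transmission_order("IBBPBBPBBI"))
```

With N=9 and M=3, the pictures I1 B2 B3 P4 of the display order are thus sent as I1 P4 B2 B3, which is consistent with the re-ordered frames P4, P7, I10 mentioned for FIG. 3.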

[0032] After a decision about the kind of macroblock compensation has been taken, the prediction error for each pixel of the concerned macroblock is obtained by subtracting the estimated macroblock from the original one. A DCT is then performed on the prediction error, for a block of 8×8 pixels (hence six DCT transforms are determined for each macroblock: four for the luminance component, two for the chrominance components), and the frequency components thus obtained are quantized. The quantization stepsize determines the bitrate and the distortion of the decoded image: if the quantization is coarse, few bits are needed to code a picture, but the final quality is low, while, if the quantization stepsize is fine, many bits are needed to code the picture, but the quality is high. As the human eye is less sensitive to the higher frequencies than to the lower ones, it is advantageous to use coarser quantizers for the high frequency components (in fact, in order to achieve the frequency dependent quantization, a weighting matrix is applied to a basic macroblock quantization parameter: many coefficients, especially those at high frequencies, are equal to 0 after said weighted quantization).

[0033] Each block is then zigzag scanned and the obtained list is coded. The run-length coding is done by determining a pair (A, NZ) where A designates the number of consecutive zeros (0 to 63) and NZ the amplitude of the following non-zero coefficient. A variable length code is then assigned to this pair (A,NZ), depending on the frequency of occurrence of this pair (a combination (A,NZ) which is common is assigned a short variable length code, whereas a pair which is less frequent is assigned a long one).
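The formation of the pairs (A, NZ) may be illustrated by the following minimal sketch. It assumes the classic zigzag scan over an 8×8 block of already quantized coefficients and treats all coefficients alike; the function names are hypothetical, and the assignment of the variable length codes themselves is not shown.

```python
def zigzag_indices(size=8):
    """(row, col) pairs of a size x size block in classic zigzag order."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    # Walk the anti-diagonals: odd ones run downwards, even ones upwards.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_length_pairs(block):
    """Return the (A, NZ) pairs of a quantized block scanned in zigzag order."""
    pairs, run = [], 0
    for r, c in zigzag_indices(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1                     # extend the current run of zeros
        else:
            pairs.append((run, value))   # A zeros followed by the amplitude NZ
            run = 0
    return pairs                         # trailing zeros are not emitted here


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[1][0], block[1][1] = 5, -2, 3
    print(run_length_pairs(block))
```

Since most high-frequency coefficients are zero after the weighted quantization, a block is typically described by only a few such pairs.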

[0034] The functions described above are summarized for example in the document “Hybrid extended MPEG video coding algorithm for general video applications”, C. T. Chen et al., Signal Processing: Image Communication 5 (1993), pp. 21-37, part 2.4, which includes the scheme of a generalized MPEG-2 coder. The aim of the VBR coding method now proposed is then to use the information that is gained from preceding coding steps carried out in several successive analysis passes to perform an adaptive allocation of the picture types, which makes it possible to minimize the size of the final bitstream so that it fits exactly on a storage medium with a fixed capacity (like a DVD). A diagram of said method is given in FIG. 4, and an architecture of video coder with picture rearrangement allowing to implement said method is illustrated in FIG. 5.

[0035] The coding method is divided into four steps 41, 42, 43 and 44. The first step 41 is an analysis one, in which a picture sequence is coded with a constant quantization stepsize Qc (and therefore with a constant quality). At the end of this step, a regular MPEG-2 compliant bit stream has been generated, but the average bit rate of the whole sequence thus processed (i.e. the quotient of the total number of coded bits over the sequence by the total number of pictures in that sequence), unknown before the end of said step, does not fulfil the required constraint of a specific size of the bitstream.

[0036] The second step 42 is a picture re-ordering one, performed after the analysis step 41. This re-ordering step may be itself subdivided into three sub-steps 110 to 130. The task of optimizing the picture allocation can be considered as comprising two separate parts. The first one is the improvement of the placement of the I pictures, which is equivalent to an optimization of the GOP allocation, while the second one is the most efficient placement of B and P pictures.

[0037] The first and second sub-steps 110 and 120 constitute the first part of said optimizing task. It is clear that I pictures, which do not exploit the temporal correlation between successive pictures of a sequence, are the most costly ones in terms of bit rate. On the other hand, they are necessary to allow random access to a sequence, and random access is important for many applications. Moreover, since quick random access is often wanted, care must be taken that a given maximum distance between I pictures is not exceeded (for instance, at most 12 pictures). When a scene change occurs, the pictures before and after the cut (left and right pictures) are uncorrelated. The motion compensation is not well performed for the first P picture in the new scene, and its bit rate is therefore approximately that of an I picture. Such an I picture can then be placed instead of said P picture without an extra cost in bandwidth. The strategy for an optimal placement of I pictures must then allocate the I pictures at the beginning of a new scene whenever possible.

[0038] The first sub-step 110, which is a scene change detection sub-step, allows for such an allocation. In order to detect scene changes, the correlation of succeeding pictures of the sequence is examined (preferably after the motion compensation): if two adjacent pictures are almost uncorrelated, it is likely that a new scene begins with the second one. In a basic MPEG-2 coding process, several parameters give some information about the correlation between successive pictures:

[0039] the complexity of a P or B picture: the connection between complexity and correlation is however not always verified (a low bit rate being sometimes due to a high correlation with the reference picture or to a low intra complexity, with a totally black picture, for instance);

[0040] a better estimation of the correlation (although more expensive in computation time) is obtained by comparing the macroblocks MB of a picture with their reference macroblocks (always provided by the motion compensation unit, whatever the type of block coding: intra or inter): the comparison can be done by means of a computation of the squared-error distortion d(MB), which is for example given by the relation (1):

$$d(MB) = \frac{1}{255}\sum_{i=0}^{255}\left(P(i) - M(i)\right)^{2} \qquad (1)$$

[0041] where P(i) is a pixel of the analyzed macroblock MB and M(i) is a pixel of the reference macroblock;
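As a purely illustrative sketch, the relation (1) may be evaluated as follows for one macroblock of 16×16 pixels. The function name `macroblock_distortion` and the representation of a macroblock as a list of 16 rows of 16 values are assumptions made for this example; the normalization by 255 is taken as written in relation (1).

```python
def macroblock_distortion(mb, ref):
    """Squared-error distortion d(MB) between a macroblock and its reference."""
    pixels = [p for row in mb for p in row]        # P(i), i = 0..255
    ref_pixels = [m for row in ref for m in row]   # M(i), i = 0..255
    if len(pixels) != 256 or len(ref_pixels) != 256:
        raise ValueError("a macroblock is expected to hold 16 x 16 pixels")
    squared_error = sum((p - m) ** 2 for p, m in zip(pixels, ref_pixels))
    return squared_error / 255                     # factor 1/255 as in (1)


if __name__ == "__main__":
    ones = [[1] * 16 for _ in range(16)]
    zeros = [[0] * 16 for _ in range(16)]
    print(macroblock_distortion(ones, zeros))
```

A high value of d(MB) for most macroblocks of a picture would indicate a low correlation with the reference picture.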

[0042] the motion estimation being macroblock-oriented, a picture is predicted using several motion compensation options:

Picture type    Motion compensation option
I               Intra (i.e. no motion compensation)
P               Intra
P               Forward
P               Not compensated
B               Intra
B               Forward
B               Backward
B               Interpolated

[0043] and the motion compensation statistics may convey information about picture correlation: if most macroblocks are intra coded, the correlation with the reference picture(s) is low and vice-versa.

[0044] The implemented embodiment uses said motion compensation statistics for the detection of scene changes, in the case where only B pictures are used for instance, as seen in FIG. 6 which shows pictures and motion vectors for a sequence PBBP with a scene change illustrated between the two B pictures (broken arrows indicate that fewer macroblocks of the concerned reference picture are used to predict the dependent picture, the correlation being lower). As the scene cut occurs between the two B pictures, the first one uses almost only the preceding P picture as reference picture, since it is basically uncorrelated with the following P picture. Similarly, the second B picture is almost uncorrelated with the preceding P picture and uses almost only the following P picture as reference.

[0045] A scene change can be placed before, between, or after two adjacent B pictures, in a group of three pictures such as illustrated (PBB, or IBB). The table given hereunder shows, for the three possible positions of the scene cut, the motion compensation that is used by most macroblocks of the first B picture and by most macroblocks of the second picture:

POSITION    FIRST PICTURE    SECOND PICTURE
before      backward         backward
between     forward          backward
after       forward          forward

[0046] (the direction of the motion compensation being “seen” from the point of view of the B pictures).

[0047] To evaluate quantitatively the motion vector statistics mentioned hereabove, two detection coefficients DCL and DCR are computed (DC for “detection coefficient”, L and R for “left” and “right”, MC for “motion compensated”).

$$DCL = \frac{(\text{intra}) + (\text{backward MC})}{(\text{forward}) + (\text{interpolated MC})} \qquad (2)$$

$$DCR = \frac{(\text{intra}) + (\text{forward MC})}{(\text{backward}) + (\text{interpolated MC})} \qquad (3)$$

[0048] For P pictures, the detection coefficient can be similarly defined:

$$DCP = \frac{\text{intra MC}}{(\text{forward MC}) + (\text{not MC})} \qquad (4)$$

[0049] As shown in FIGS. 7 and 8 which indicate values of the left and right detection coefficients DCL and DCR for successive B pictures, scene cuts clearly correspond to spikes. Similarly, the detection coefficients DCP for P pictures are shown in FIG. 9 (obviously, the information conveyed by the motion vectors of P pictures is much less reliable than that provided by the B pictures).
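The detection coefficients of relations (2) to (4) may, for instance, be computed from per-picture macroblock counts as in the following sketch. The function names and the way the counts are passed are assumptions made for illustration; returning infinity when a denominator is zero is likewise an arbitrary choice of this example, not something specified in the description.

```python
def _ratio(numerator, denominator):
    """Quotient of two macroblock counts (infinite if the denominator is 0)."""
    return numerator / denominator if denominator else float("inf")


def detection_coefficients_b(intra, forward, backward, interpolated):
    """Return (DCL, DCR) for one B picture, following relations (2) and (3)."""
    dcl = _ratio(intra + backward, forward + interpolated)
    dcr = _ratio(intra + forward, backward + interpolated)
    return dcl, dcr


def detection_coefficient_p(intra, forward, not_compensated):
    """Return DCP for one P picture, following relation (4)."""
    return _ratio(intra, forward + not_compensated)


if __name__ == "__main__":
    # Hypothetical counts: a B picture predicted mostly from the past picture.
    print(detection_coefficients_b(intra=10, forward=300, backward=5, interpolated=15))
    print(detection_coefficient_p(intra=30, forward=60, not_compensated=30))
```

In this hypothetical example the B picture is predicted almost only from the past reference picture, so that DCR is large while DCL stays small.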

[0050] In the case, for example, of a scene change that occurs after the two B pictures, i.e. between the second B picture and the following reference picture (on the right side of said second B picture), few macroblocks of the analyzed B picture are backward compensated or interpolated, since the correlation between said B picture and the following reference picture is low, and a majority of them are intra or forward motion compensated: the value of the detection coefficient DCR is therefore high, whereas the value of the detection coefficient DCL is not increased. Conversely, in the case of a scene change on the left side of the first B picture, between the previous reference picture and said B picture, DCL has a high value and DCR remains small, while both DCL and DCR have a small value if no scene change occurs in the block of M pictures. In fact, in order to have a single, symmetric indicator of scene changes, the difference DDV between both detection values is computed, which yields:

DDV=DCL−DCR  (5)

[0051] that is to say:

$$DDV = \frac{(\text{intra}) + (\text{backward MC})}{(\text{forward}) + (\text{interpolated MC})} - \frac{(\text{intra}) + (\text{forward MC})}{(\text{backward}) + (\text{interpolated MC})} \qquad (6)$$

$$DDV = \frac{(\text{NBMB per picture}) \times (\text{backward MC} - \text{forward MC})}{(\text{forward} + \text{interpolated}) \times (\text{backward} + \text{interpolated})} \qquad (7)$$
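The passage from relation (6) to relation (7) may be checked as follows, writing i, f, b and p for the numbers of intra, forward, backward and interpolated macroblocks of the B picture, and assuming that every macroblock belongs to exactly one of these four classes, so that the number of macroblocks per picture is NBMB = i + f + b + p:

```latex
\begin{aligned}
DDV &= \frac{i+b}{f+p} - \frac{i+f}{b+p}
     = \frac{(i+b)(b+p) - (i+f)(f+p)}{(f+p)(b+p)} \\
    &= \frac{i\,(b-f) + (b^{2}-f^{2}) + p\,(b-f)}{(f+p)(b+p)}
     = \frac{(i+f+b+p)\,(b-f)}{(f+p)(b+p)}
     = \frac{NBMB\,(b-f)}{(f+p)(b+p)}
\end{aligned}
```

The sign of DDV is therefore that of the difference between the numbers of backward and forward compensated macroblocks.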

[0052] This difference DDV, called motion compensation ratio, is computed for each B picture of each group of three pictures IBB or PBB. As it is assumed that there is no more than one scene change for each group, a decision value DVL measuring the probability of such a scene change is determined by adding the absolute values of DDV for the two adjacent B pictures:

$$DVL = \frac{\left|DDV(1)\right| + \left|DDV(2)\right|}{2} \qquad (8)$$

[0053] the numbers 1 and 2 indicating whether the ratio is related to the first or to the second of the two succeeding pictures. The exact position of the scene change with respect to the bidirectional pictures can then be determined by looking at the signs of the two ratios:

[0054] if DDV(1)>0 and DDV(2)>0, the scene change has occurred before the first B picture;

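[0055] if DDV(1)<0 and DDV(2)>0 (the first B picture being mostly forward compensated and the second one mostly backward compensated, in accordance with relation (7)), the scene change has occurred between the two B pictures;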

[0056] if DDV(1)<0 and DDV(2)<0, the scene change has occurred after the two B pictures.
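The decision described above may be sketched as follows, for illustration only. The function names are hypothetical; the sketch assumes non-zero denominators and, in line with the assumption of at most one scene change per group, reads two ratios of opposite signs as a cut located between the two B pictures.

```python
def ddv(intra, forward, backward, interpolated):
    """Motion compensation ratio DDV = DCL - DCR of one B picture."""
    dcl = (intra + backward) / (forward + interpolated)
    dcr = (intra + forward) / (backward + interpolated)
    return dcl - dcr


def decision_value(ddv1, ddv2):
    """Decision value DVL: mean of the absolute ratios of the two B pictures."""
    return (abs(ddv1) + abs(ddv2)) / 2


def cut_position(ddv1, ddv2):
    """Position of a presumed scene cut relative to the two B pictures."""
    if ddv1 > 0 and ddv2 > 0:
        return "before"        # both B pictures rely on the future reference
    if ddv1 < 0 and ddv2 < 0:
        return "after"         # both B pictures rely on the past reference
    if ddv1 * ddv2 < 0:
        return "between"       # the two B pictures rely on different references
    return "undetermined"      # at least one ratio is exactly zero


if __name__ == "__main__":
    first = ddv(intra=10, forward=300, backward=5, interpolated=15)
    second = ddv(intra=10, forward=5, backward=300, interpolated=15)
    print(decision_value(first, second), cut_position(first, second))
```

The position returned is only meaningful when the decision value is high enough to indicate a scene change; the threshold to be used is not specified here.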

[0057] The performance of the scene change detection method using only B pictures is shown in FIG. 10. One decision value for each IBB or PBB group is computed, and it may be observed that:

[0058] the spikes of the decision values are at the same position as the real scene cuts;

[0059] the noise around the macroblock n°50 is caused by light effects in the sequence, which disturb the motion estimation algorithm and therefore the motion compensation dependent scene cut prediction;

[0060] the last part of the examined video sequence is basically a standing image (the pictures are almost identical): it is therefore undefined which motion compensation is used, since the reference macroblock is the same for all compensation types, and the decision values consequently have a non-negligible value although no scene change occurs (to reduce this risk of wrong scene change predictions, it can then be useful to consider the motion vector statistics of P pictures in addition to those of B pictures: if the detection coefficient for a P picture is low, no scene change has occurred for the three preceding pictures).

[0061] The second sub-step 120 is a GOP allocation sub-step. An optimal allocation of a GOP is determined by two conflicting aims:

[0062] (a) the first one is to select a preferred size for the GOP: if a GOP is too small, bits are wasted because more costly I pictures are allocated than necessary, while random access is impaired if a GOP is too big;

[0063] (b) the second one is to match the start picture of a GOP with the position of a scene change.

[0064] Hence the problem of GOP allocation is to arrange the GOPs in an optimal way while meeting the constraints (a) and (b) (i.e. to start a new GOP at the beginning of a new scene, a maximum and a minimum size of said GOP being respected). In order to solve this optimization problem, a Viterbi algorithm is used: for each path the deviation from the preferred size of the GOP is penalized whereas the inclusion of a probable scene change at the start of a GOP is rewarded, the cumulative sum of all decision values determining the path which is chosen for each picture.

[0065] This algorithm finds the optimal start positions of the GOPs over the sequence. Every picture has an attached scene change decision value which describes the probability of a scene cut at the respective position: if the decision value is big, it means that there is a high probability for a scene change at that position, and it is therefore profitable to allocate a new GOP. However, as the GOPs must be neither too small nor too big, the transitions between the GOP start points (i.e. the size of the GOPs) are also weighted.

[0066] In FIG. 11, the vertical lines represent pictures, the parameters S_(i) describe the scene change probabilities of the respective picture (it is assumed that only one scene change occurs for a group of three pictures, hence only one decision value D_(i)(N) will exist for each PBB or IBB block), and the W parameters are the weighting values which evaluate the size of the GOP (sizes close to an optimum size, such as 12, being preferred). The decision value of a path which ends at picture i is then computed as follows:

D_(i)(N) = C_(i−N) + W(N)  (9)

[0067] with N being the size of the considered GOP and C_(i−N) being the sum of all weights S and W for the optimum allocation of GOPs from picture “1” up to picture “i−N”. The GOP size which has the highest decision value is chosen. The weighting function W=f(N) is a quadratic one, and W therefore decreases proportionally with the squared difference between the GOP size and the preferred GOP size, as indicated in FIG. 12 illustrating an example of weighting function for the GOP size (the proposed weight function causes all GOPs between two scene changes to have approximately the same size: hence, if two scene changes have a distance of 16 pictures, two GOPs with a size of 8 are allocated rather than one GOP with the size of 10 and another with the size of 6).
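A minimal sketch of such a path search is given below for illustration. It is a dynamic-programming formulation written under stated assumptions and is not necessarily the exact recursion of FIG. 11: each GOP contributes the scene change score S of its first picture plus a quadratic weight W(N) = −k·(N − preferred)², and the sequence is split into whole GOPs whose sizes lie between `n_min` and `n_max`. The function name, the parameter names and their default values are all hypothetical.

```python
def allocate_gops(scene_scores, preferred=12, n_min=4, n_max=15, k=1.0):
    """Return the GOP start indices that maximize the cumulative decision value."""
    n = len(scene_scores)
    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)   # best[i]: best sum for pictures 0..i-1
    prev = [-1] * (n + 1)        # prev[i]: start of the last GOP ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for size in range(n_min, min(n_max, end) + 1):
            start = end - size
            if best[start] == neg_inf:
                continue                              # no valid path up to start
            weight = -k * (size - preferred) ** 2     # quadratic size penalty W(N)
            score = best[start] + scene_scores[start] + weight
            if score > best[end]:
                best[end], prev[end] = score, start
    if best[n] == neg_inf:
        raise ValueError("sequence cannot be split within the size limits")
    starts, end = [], n
    while end > 0:                                    # trace the best path back
        starts.append(prev[end])
        end = prev[end]
    return starts[::-1]


if __name__ == "__main__":
    scores = [0.0] * 32
    scores[16] = 100.0            # a probable scene change at picture 16
    print(allocate_gops(scores))  # GOP start positions
```

With a probable scene change 16 pictures after the start of the sequence, this sketch allocates two GOPs of 8 pictures before the cut rather than one of 10 and one of 6, which is the behaviour described for the quadratic weighting function.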

[0068] Up to now, however, it has not been considered that there is a difference between the transmission order and the display order of pictures. If one considers that the start of a GOP is allocated at the first picture after a scene change, then the first group of M pictures in the GOP starts at the scene cut, too. However, the I picture is the last picture of the block to be displayed. If M=3, the first two pictures of the GOP are coded as B pictures and only the third one is an I picture. Therefore the start of the GOPs can be shifted by one or two positions to the left in order to guarantee that the first picture after a scene change is really an I picture and not a B one.

[0069] The third sub-step 130 is a P and B picture allocation sub-step. With a view to optimizing the time-dependent parameter M, an adaptive search for the best place of B and P pictures indeed allows a minimization of the bitrate needed for the coding of the sequence. Increasing the value of M increases the bitrate of P pictures, but more bitrate efficient B pictures are used instead of P pictures. The correlation between succeeding pictures is therefore the most important parameter for the optimization, which will be in fact subdivided into two sub-tasks:

[0070] (a) a long-term optimization, in order to find the optimum M over several GOPs;

[0071] (b) a short-term optimization, in order to find the best place of B and P pictures inside a GOP while taking into account the local variations of the correlation between pictures.

[0072] With respect to the long-term optimization, it must be noted that, if the correlation coefficient between successive pictures tends toward one, it does not matter whether a B picture or a P picture is chosen since almost no coefficient bits remain in any case, while motion compensation does not work if said correlation is very low. In these extreme cases (respectively a standing image and uncorrelated pictures), it is not obvious which M is to be preferred. In the other cases, it is generally possible to say that a small M performs well for a low correlated sequence and that a big M is better for a sequence with high correlation. The best results for the long-term optimization of M are obtained if experiments are performed over a large number of scenes.

[0073] With respect to the short-term optimization, it may be added that M can arbitrarily vary inside each GOP, which makes it possible to use short-term variations of the correlation between pictures in order to minimize the bitrate. An example for short-term optimization of M is given in the following table, indicating the choice of M before a scene change:

            SCENE 1                          SCENE 2
POSITION    1    2             3             4
M = 1       P    P             P             I
M = 2       P    P             B (like P)    I
M = 3       P    B (like P)    B             I

[0074] Obviously, the B pictures before the scene change can only be forward predicted. It does not make a big difference whether M=1 or 2 before the new scene, because the B picture before the scene change behaves like a P picture; a choice of M=3 is clearly worse because the B picture at position 3 uses a reference picture (the preceding P picture) at position 1, hence at a distance of two positions. Since the correlation between pictures decreases as their distance from each other increases, the bitrate of the B picture at position 3 is higher for M=3 than the bitrates of the pictures at the same position for M=1 or 2.

[0075] The third step 43 is a prediction one, intended to predict the quantization stepsize Q which must be used to code the bitstream according to the specific target bitrate. Once said prediction step is completed, the analysis step 41 may be repeated (arrow in FIG. 4) as often as necessary in order to get a more precise estimation for Q (however, a good prediction is generally obtained after a few runs, for instance two).

[0076] As the quantization stepsize Q available at the end of this prediction step is only an estimated value, the total bit budget is not exactly matched if every picture is coded by using said predicted value. A final step 44 is provided which guarantees that the constraint on the total average bit rate is strictly observed. To ensure that the final output bitstream has indeed exactly the desired size, a quantization stepsize control process is implemented. This process is based on a control loop relying on a comparison of predicted and real bit rates. After the coding of each picture in the final step, the control process compares the total number of bits that have been spent with the allowed one. If more bits have been spent than the budget allows, the quantization stepsize is increased, and the bit rate of the following pictures is reduced. If fewer bits have been spent than the budget allows, Q is decreased and the bit rate is increased, the total target bit rate being finally exactly matched.

[0077] Said VBR coding method may be implemented in a coder having an architecture such as shown in FIG. 5, where each block corresponds to a particular function that is performed under the supervision of a controller 55. The illustrated coder comprises in series an input buffer 51, a subtractor 549, a DCT circuit 521, a quantization circuit 522, a variable length coding circuit 523, and an output buffer 524. The circuits 521 to 524 constitute the main elements of a coding branch 52, with which a prediction branch 53, including an inverse quantization circuit 531, an inverse DCT circuit 532 and a prediction sub-system, is associated. This prediction sub-system itself comprises an adder 541, a buffer 542, a motion estimation circuit 543 (said estimation is based on an analysis of the input signals available at the output of the buffer 51), a motion compensation circuit 544 (the output signals of which are sent backwards to the second input of the adder 541), and the subtractor 549 (receiving the output signals of the buffer 51 and the output signals of the motion compensation circuit 544, for sending the difference of said signals towards the coding branch).

[0078] The output of the illustrated coder is sent towards the controller 55 that includes the control loop provided to carry out the final step 44. The main elements of said control loop for the final pass of the VBR coder are shown in FIG. 13. As already explained, it is necessary to adjust the quantization stepsize during this final coding pass, in order to ensure that the total target bit rate given by the operator is exactly matched. Said loop first comprises a first computation circuit 131 in which the output of the loop (i.e. the cumulative prediction error) is multiplied by a factor KP. This factor is itself equal to a constant QC₁ (chosen by the operator) multiplied by a weighting factor Q_(int)/APG, where Q_(int) is an integrative estimation of Q and APG the total number of bits for a GOP (of N pictures).

[0079] An adder 133 then adds the output Q_(prop) of said circuit 131 and the signal Q_(int), available at the output of a second computation circuit 132 provided for yielding an integrative estimation of Q. A conversion circuit 134 gives the cumulative bitrate for all preceding pictures, on the basis of a relation R=f₁(Q) (between the quantization factor Q at the output of the adder 133 and the bitrate R) stored in said circuit 134. The cumulative bitrate thus obtained is compared in a comparator 135 with the cumulative predicted bitrate available on a second input of said comparator and is used, after an integration in a circuit 136, in order to modify Q accordingly.
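The proportional part of this control loop may be sketched as follows, purely by way of example. The function names are hypothetical, and the sign convention (a positive cumulative error meaning that more bits have been spent than predicted) is an assumption of this sketch; the computation of the integrative estimation Q_(int) and the relation R=f₁(Q) are not modelled.

```python
def proportional_gain(qc1, q_int, apg):
    """Factor KP: the constant QC1 weighted by Q_int / APG."""
    return qc1 * q_int / apg


def next_stepsize(q_int, cumulative_error_bits, qc1, apg):
    """Stepsize Q at the adder output: Q_int plus the proportional term."""
    q_prop = proportional_gain(qc1, q_int, apg) * cumulative_error_bits
    return q_int + q_prop


if __name__ == "__main__":
    # Hypothetical values: 200 bits overspent with respect to the prediction.
    print(next_stepsize(q_int=10.0, cumulative_error_bits=200, qc1=0.5, apg=1000))
```

With this convention, an overspent budget raises Q, which lowers the bit rate of the following pictures, while an underspent budget lowers Q, in line with the behaviour described for the final step 44.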

[0080] The VBR coding strategy as presented above is an improvement with respect to previous VBR coders because it achieves a better equalization of the perceptual quality of the decoded sequence. The classical VBR coders adjust the quantization parameter Q while coding a picture, so that the predicted bit rate is matched for every picture. Hence they allow the quantization parameter Q to vary inside a picture, and no constant spatial quality of the picture can be achieved. This variation in quality occurs whether the bit rate of the picture is correctly predicted or not. For the proposed VBR coder, Q is kept constant over a picture and the spatial quality of any picture in the video sequence does not vary. If the picture bit rates and the quantization stepwidth are correctly estimated, the Q before adaptive quantization, and with it the subjective distortion, remains exactly constant for all macroblocks of the sequence. Since the quantization stepwidth and the picture bit rates are only estimated, a variation of Q, and hence of the quality of the sequence, occurs from picture to picture, but, after some analysis passes, the deviations of Q, averaged over a picture, are generally below 1%.

[0081] Apart from the attainment of a constant intra-picture quality, several other important aspects of the new VBR strategy may be mentioned:

[0082] it is possible to improve the prediction of the quantization factor in an iterative way by increasing the number of analysis passes: if, after the analysis run, the deviation from the wanted target bit rate is still too high, a better estimation for the quantization factor can be calculated using the results from the previous coding passes;

[0083] as the new VBR coding strategy predicts Q, analysis passes that are performed with a picture order different from that of the predicted pass can be exploited: this is impossible using old strategies, and this is a major advantage of the new coding concept;

[0084] if, in the final pass, the variations of Q and consequently of the quality turn out to be unacceptably high, the final step can be used as an analysis pass for the prediction of Q and of the bitrate for the subsequent pass: using this feature, it is possible to develop a coder that performs as many coding passes as needed until the characteristics of the output bitstream are within certain limits defined by the operator;

[0085] as the control loop has an integrative character, short-term bitrate prediction errors cancel each other out: therefore, systematic, picture-type dependent prediction errors do not seriously affect the performance of the proposed VBR coder.
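
The iterative refinement mentioned in the aspects listed above (repeating passes until the output is within limits set by the operator) may be sketched as follows. This is a hedged illustration: `refine_q`, `toy_encoder`, the tolerance and the update rule (bits assumed roughly proportional to 1/Q) are hypothetical choices made for the example and are not taken from the description.

```python
def refine_q(encode_at_constant_q, target_bits, q_start,
             tolerance=0.01, max_passes=8):
    """Repeat analysis passes until the bit budget is met within `tolerance`.

    encode_at_constant_q -- hypothetical callable: Q -> total bits produced
    Returns (q, passes_used, converged).
    """
    q = q_start
    for passes in range(1, max_passes + 1):
        bits = encode_at_constant_q(q)
        deviation = abs(bits - target_bits) / target_bits
        if deviation <= tolerance:
            return q, passes, True
        # Assumed update rule: bits roughly proportional to 1/Q,
        # so Q is scaled by the ratio of produced to wanted bits.
        q *= bits / target_bits
    return q, max_passes, False


def toy_encoder(q):
    """Stand-in for a real coding pass (illustrative rate curve only)."""
    return 8_000_000 / q ** 1.2


q, passes, converged = refine_q(toy_encoder, target_bits=1_000_000, q_start=4.0)
```

Each pass reuses the result of the previous one, so the estimate of Q improves as long as the assumed rate behaviour is not too far from that of the actual coder.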

[0086] The invention is obviously not limited to the embodiment described hereinbefore, from which variations or improvements may be conceived without departing from the scope of said invention. For instance, an optional fourth sub-step, referenced 140 in FIG. 4 and shown with connections in dotted lines, may be included into the re-ordering step 42, as now explained. In order to code a sequence exactly at a given bitrate R(t) in the final pass of the last step 44, it is indeed necessary to predict a target quantization stepwidth Q and the target picture bitrates R(i). For the execution of an analysis pass, the only requirement is to have a prediction of Q. As no control system for Q is used during the first analysis step 41, no prediction of the picture targets is necessary. For the estimation of the bitrates R(i) and the stepwidth Q, the quantization factor and the picture bitrates of the previous coding pass are needed. However, if the order of the picture types is changed between the two passes, the same picture of a sequence may be coded by two different picture types in said two passes.

[0087] If one considers for example that the analysis pass was performed using N=12 and M=3, whereas the predicted pass is coded with N=8 and M=2, the corresponding picture allocations are shown in the following table:

N    M    Picture display order
12   3    BBIBBPBBPBBPBBIB
8    2    BIBPBPBPBIBPBBBP

[0088] where the second picture is coded as a B picture in the first pass and as an I picture in the second pass. As the bitrate prediction is provided for predicting the target bitrate of a picture which has the same type as the picture in the first analysis pass, if a picture was coded as a B picture in said first pass, the bitrate of a B picture is hence predicted for the second pass. In case of a modified picture order, the predicted picture bitrates are therefore useless.
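
The mismatch between the two picture orders can be made explicit with a short Python sketch. The two strings are the rows of the table above; the function name `changed_picture_types` is introduced here for illustration only.

```python
def changed_picture_types(first_pass, second_pass):
    """Return (position, old_type, new_type) for pictures whose type differs.

    Positions are 1-based display-order indices.
    """
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(first_pass, second_pass), start=1)
            if a != b]


analysis_order = "BBIBBPBBPBBPBBIB"    # N=12, M=3 (row taken from the table)
predicted_order = "BIBPBPBPBIBPBBBP"   # N=8,  M=2 (row taken from the table)
changes = changed_picture_types(analysis_order, predicted_order)
```

The first entry of `changes` is the second picture, which changes from B to I as stated above; every picture listed would receive a bitrate prediction made for a picture type other than the one actually used in the second pass.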

[0089] As the prediction of the target picture bitrates is not possible after a picture re-arrangement, the final coding pass cannot be performed directly after the picture re-ordering. A second analysis pass must therefore be carried out before said final coding pass: hence at least three coding passes are needed in that case for the VBR coder according to the invention. In order to guarantee that the predicted picture bitrates for the final pass are not too wrong, an “inter-picture” prediction additional sub-step may therefore be provided, which estimates the bitrates that the pictures would have if the analysis pass had been performed with the new picture order instead of the old one. This additional sub-step 140, which is, as already said, optional, exploits the temporal correlation of picture bitrates.
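
The description does not specify how sub-step 140 computes its estimates. Purely as an illustration of what exploiting the temporal correlation of picture bitrates could look like, the following Python sketch uses an assumed nearest-neighbour rule: a picture that keeps its type keeps its measured bitrate, and a picture whose type changes borrows the bitrate of the temporally nearest picture that had the new type in the analysis pass. The function name, the rule itself and the sample figures are hypothetical.

```python
def inter_picture_estimate(old_types, old_bits, new_types):
    """Estimate per-picture bits for `new_types` from an analysis pass.

    Assumed rule (not taken from the text): a picture that keeps its type keeps
    its measured bits; otherwise it borrows the bits of the temporally nearest
    picture that was coded with the new type in the analysis pass.
    """
    estimates = []
    for i, new_type in enumerate(new_types):
        if old_types[i] == new_type:
            estimates.append(old_bits[i])
            continue
        candidates = [j for j, t in enumerate(old_types) if t == new_type]
        if not candidates:
            raise ValueError(f"no {new_type} picture in the analysis pass")
        # On a tie in distance, the earlier picture is chosen.
        nearest = min(candidates, key=lambda j: abs(j - i))
        estimates.append(old_bits[nearest])
    return estimates


# Illustrative figures in arbitrary units, not measured data.
estimates = inter_picture_estimate("BBIBBP", [10, 12, 80, 11, 13, 40], "BIBPBP")
```

A real implementation could instead interpolate between neighbouring pictures of the same type or weight them by distance; the sketch only shows the kind of information such a sub-step would draw on.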

1. A variable bitrate video coding method including an iterative process that comprises a first analysis step, for coding a bitstream corresponding to a picture sequence with a constant quantization stepsize, and a second prediction step, for predicting the quantization stepsize which must then be used to code said bitstream according to a predetermined target bit rate, and is followed by a final control step, for adjusting the stepsize with respect to said target bit rate, said method being characterized in that it comprises, between the analysis and prediction steps of one iteration, a picture re-arrangement step.
2. A method according to claim 1, characterized in that said picture re-arrangement step itself comprises in series a first scene change detection sub-step, a second allocation sub-step, and a third optimum placement sub-step.
3. A method according to claim 2, characterized in that said scene change detection sub-step comprises a correlation operation, carried out between succeeding pictures of the sequence, and a decision operation, for indicating the possible occurrence of a scene change.
4. A method according to claim 3, characterized in that said correlation operation is based on a picture complexity estimation.
5. A method according to claim 3, characterized in that said correlation operation is based on the comparison of the blocks of a picture with reference blocks in a previous reference picture.
6. A method according to claim 2, characterized in that said allocation sub-step is based on the implementation of a Viterbi algorithm allowing to select a preferred size for successive groups of pictures while matching a scene change with the start of such a group of pictures.
7. A method according to any one of claims 2 to 6, characterized in that said optimum placement sub-step comprises a first long-term optimization operation, for finding over several groups of pictures the optimum spacing between the pictures of these groups, and a second short-term optimization operation, for finding inside a group of pictures the best places of predicted and interpolated pictures.
8. A method according to any one of claims 2 to 7, characterized in that an additional inter-picture prediction step is provided in case of change of the order of the picture types between two successive iterations.
9. A variable bit rate video coder comprising a first coding branch, a prediction branch, and a control circuit provided for carrying out the implementation of the following operations: a coding operation of a bitstream corresponding to a picture sequence, with a constant quantization stepsize; a prediction operation, for an estimation of the quantization stepsize allowing to code said bitstream according to a specified target bitrate; at least one repetition of said operations; a final controlling operation, for adjusting the stepsize with respect to said target bitrate; characterized in that said control circuit is provided for implementing, between the first coding operation and the first prediction operation, a picture re-arrangement operation.